Flat Topic List
This is an unsorted list of all possible topics; see Topics (Summer 2024) for a curated overview.
Application-Specific Conflict Resolution
In decentralized systems, conflicts are commonplace. They can arise whenever one peer performs an operation that invalidates some concurrent operation by another peer. There exist a multitude of different conflict resolution strategies: For example, we can simply keep the conflict and let the users fix it (git), we can prevent the conflict from happening by enforcing a consensus between the peers, or we can define an application-specific strategy that resolves conflicts (e.g. adds win over removes).
Most distributed applications pick one conflict resolution strategy for the whole application, usually determined by the application's consistency level. However, such a global strategy is often too permissive (leading to concurrency bugs) or too restrictive (slowing down the application and preventing offline availability). Ideally, we would pick just the right strategy for each part of the application, being permissive where we can and strict where needed.
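As a concrete illustration of an application-specific strategy, the "adds win over removes" rule can be sketched as a small observed-remove set. The class and tagging scheme below are illustrative, not taken from any particular library:

```python
# Sketch of an add-wins (observed-remove) set: each add is tagged with
# a unique id, and a remove only cancels the tags it has seen, so a
# concurrent add survives the merge.
import uuid

class AddWinsSet:
    def __init__(self):
        self.adds = set()      # (element, unique_tag) pairs observed as added
        self.removes = set()   # tags observed as removed

    def add(self, element):
        self.adds.add((element, uuid.uuid4().hex))

    def remove(self, element):
        # Cancel only the add-tags this replica has *seen*; a concurrent
        # add carries a fresh tag and therefore wins after merging.
        self.removes |= {tag for (e, tag) in self.adds if e == element}

    def merge(self, other):
        merged = AddWinsSet()
        merged.adds = self.adds | other.adds
        merged.removes = self.removes | other.removes
        return merged

    def value(self):
        return {e for (e, tag) in self.adds if tag not in self.removes}
```

Here a concurrent add and remove of the same element resolve to the element being present, without any coordination between the peers.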
As part of this topic, you will analyse different conflict-resolution strategies and evaluate them in the context of local-first software.
- Blazes: Coordination analysis for distributed programs
- Replicated Data Consistency Explained Through Baseball
- Rich-CRDTs
- Tackling Consistency-related Design Challenges of Distributed Data-Intensive Systems - An Action Research Study
- A Classification of Replicated Data for the Design of Eventually Consistent Domain Models
Mergeable Replicated Data Types
This is a reproduction project of an existing paper, where the goal is to reproduce and confirm or deny the results of the original paper.
MRDTs propose a replication strategy where existing data structures are converted to a universal set representation, which can be automatically replicated. The claims are that this is simpler and faster than the alternatives, but is this true?
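The core of the set-based replication idea can be sketched as a three-way merge against the lowest common ancestor. This is a simplified illustration of the approach, not the paper's exact construction:

```python
# MRDT-style three-way merge on the set representation: keep what both
# sides kept, plus whatever either side added relative to the lowest
# common ancestor (lca). Deletions on either side are respected because
# a deleted element is no longer in that side's set.
def merge(lca, left, right):
    return (left & right) | (left - lca) | (right - lca)
```

For example, if one replica removes an element while the other adds a new one, the merge keeps the addition and drops the removed element.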
- Mergeable replicated data types
- Bolt on Convergence for MRDTs
- Certified mergeable replicated data types
On CRDTs in Byzantine Environments
In a Byzantine peer-to-peer system, other nodes are actively trying to attack the system. Martin Kleppmann has noted that replicated data types are already well positioned to deal with such attacks, by recording a history of all operations and protecting that history with cryptographic checksums. This is the concept underlying the Matrix protocol – so the concept works – but does it generalize to other replicated data types?
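The history-protection idea can be sketched as a simple hash chain over the operation log; the encoding below is illustrative, real systems hash structured operations and use signed, DAG-shaped histories:

```python
# Sketch: a running hash over an operation history. A Byzantine peer
# that rewrites any earlier operation changes the final hash, so honest
# peers can detect the tampering by comparing digests.
import hashlib

def chain(history):
    """Return the running SHA-256 digest over a list of operation strings."""
    h = "genesis"
    for op in history:
        h = hashlib.sha256((h + op).encode()).hexdigest()
    return h
```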
Testing Local-First Software
Testing software becomes much harder in a local-first setting, because it is insufficient to test good/bad results of operations in isolation; one must also consider concurrent execution of other operations. Moreover, for many local-first applications, it is not even clear what a “correct” result is. For example, if two users each concurrently add three new items to their shared shopping list, which of the possible resulting lists are correct?
A promising strategy is “property based testing”, where developers define abstract properties that should always be true (such as “the shopping list should contain each added item but no duplicates”), while the testing system automatically generates example executions and then checks the defined properties. However, for effective testing, the generated example executions should exercise many potential problems, which is still a mostly unsolved problem.
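The shopping-list property from above can be sketched as a hand-rolled property-based test; the toy `merge` function and generator are illustrative, a real tool like QuickCheck or Hypothesis would also shrink counterexamples:

```python
# Sketch of a property-based test: randomly generate concurrent
# executions (two replicas adding items to a shared base list), merge
# them, and check the property "every added item appears exactly once".
import random

def merge(base, left, right):
    # toy union-based merge that preserves first-seen order
    result = list(base)
    for item in left + right:
        if item not in result:
            result.append(item)
    return result

def check_property(trials=100):
    for _ in range(trials):
        base = random.sample(range(100), k=5)
        left = base + random.sample(range(100, 200), k=3)    # replica 1 adds
        right = base + random.sample(range(200, 300), k=3)   # replica 2 adds
        merged = merge(base, left, right)
        assert set(merged) == set(left) | set(right), "missing/spurious items"
        assert len(merged) == len(set(merged)), "duplicates"
    return True
```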
- Model-checking-driven explorative testing of CRDT designs and implementations
- Effective Concurrency Testing for Distributed Systems
- Trace aware random testing for distributed systems
Verification of Local-First Software
Verification of local-first software (and distributed software in general) is challenging because it is insufficient to only reason about the local behaviour of a program. Instead, we have to account for operations that can happen concurrently on multiple devices. Existing approaches employ advanced type systems, deductive program verification, model checking or combinations thereof.
In this topic, you will explore the current state of verification approaches in the local-first domain (seminar) or implement your own/extend existing ones (project).
- LoRe: A Programming Model for Verifiably Safe Local-first Software
- Pretend synchrony: synchronous verification of asynchronous distributed programs
- Sound sequentialization for concurrent program verification
- Dynamic Partial Order Reduction for Checking Correctness against Transaction Isolation Levels
- AMC: Towards Trustworthy and Explorable CRDT Applications with the Automerge Model Checker
- Disciplined Inconsistency with Consistency Types
Version Conflicts and Schema Evolution in Decentralized Systems
Like every modern application, decentralized applications get updated from time to time. These updates sometimes introduce breaking changes, e.g., when they change/drop certain APIs or when they change the way messages in the systems are formatted.
This poses an interesting problem that is sometimes referred to as schema migration/evolution: How do we deal with the fact that different versions of the same application might co-exist in a running distributed system? Maybe some participants have already updated their app while others have not.
If we want to prevent errors due to version conflicts, we have to develop distributed applications that are ready to deal with data from the "past", but also with possible updates from the "future".
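One simple coping strategy is a "tolerant reader" that fills in defaults for fields an old writer did not know and preserves fields a future writer added. The message schema and field names below are made up for illustration:

```python
# Sketch of a tolerant reader for versioned messages: defaults cover
# data from the "past" (missing fields), and unknown fields from the
# "future" are kept rather than dropped.
def read_task(msg):
    defaults = {"title": "", "done": False, "priority": 1}
    return {**defaults, **msg}   # msg overrides defaults, extras survive
```

Edit lenses (as in Cambria, listed below) generalize this idea to bidirectional, composable transformations between schema versions.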
In this topic you will research and survey different strategies for dealing with schema evolution such as edit lenses (seminar) or you will try them out/implement your own (project).
- Live & Local Schema Change: Challenge Problems
- Cambria: Schema Evolution in Distributed Systems with Edit Lenses
Technique: Streams and Corecursion
Programs usually operate over data structures that have a finite size. For example, arrays have a definite end. On the other hand, codata refers to potentially infinite data structures. The simplest example are streams, which are sequences that can go on forever. Programming with codata introduces new challenges—for example, trying to calculate the sum of a stream may lead to a non-terminating computation—but it also enables interesting applications and techniques.
Exact Arithmetic With Digit Streams
Exact real arithmetic allows us to perform numerical computations to arbitrary precision. This contrasts with floating-point numbers, which can produce drastically imprecise results in some cases. In functional programming courses you have probably encountered Peano arithmetic (with Zero and Succ) as one of the fundamental examples of pattern matching and functional programming. Similarly, a digit stream is for example a stream of digits like 0.9999999999..., and arithmetic on digit streams can be considered a fundamental example of working with streams. The idea behind exact real arithmetic is that, while we cannot directly represent real numbers, as that would require infinite memory, we can produce streams of better and better approximations and stop when the result is precise enough for us.
Individual challenges could be to implement
- Interval Arithmetic & Midpoint Arithmetic
- Rational Numbers and Dyadic Rational Numbers
- Shrinking Intervals & Signed Digit Streams
- Eudoxus Numbers
- Algebraic Numbers
- Dedekind vs Cauchy Numbers
- Floats vs Posits.
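The stream-of-approximations idea can be sketched with Python generators; sqrt(2) via Newton's method is an illustrative choice, not one of the listed challenges:

```python
# Sketch: a real number as an infinite stream of ever better rational
# approximations, here sqrt(2) via Newton's method with exact rationals.
from fractions import Fraction

def sqrt2_approximations():
    x = Fraction(1)
    while True:                  # codata: the stream never ends
        yield x
        x = (x + 2 / x) / 2      # Newton step for x^2 = 2

def approximate(stream, precision):
    """Consume the stream until two successive approximations differ
    by less than the requested precision, then stop."""
    prev = next(stream)
    for x in stream:
        if abs(x - prev) < precision:
            return x
        prev = x
```

The consumer decides how much of the infinite stream to force, which is exactly the "stop when the result is precise enough" idea.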
Many real-world systems continually receive new data which they need to transform, analyze and aggregate. Frameworks like Apache Flink have adopted data streams as the underlying abstraction for implementing such systems.
Stream Compilers
Ideally, a stream computation framework allows programmers to use a high-level, declarative interface while producing highly efficient code. Techniques from functional programming like higher-order abstract syntax, intrinsically typed embeddings, and smart constructors might be useful for implementing such a framework.
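A tiny instance of the smart-constructor idea can be sketched in Python: the constructor inspects the stream expression it is given and fuses two consecutive maps into a single pass. The tuple-based representation is purely illustrative:

```python
# Sketch of a smart constructor for a stream DSL: building map(f)
# on top of map(g) produces one fused map(f . g) node instead of two,
# so the interpreter traverses the stream only once.
def smart_map(f, stream):
    if isinstance(stream, tuple) and stream[0] == "map":
        _, g, inner = stream
        return ("map", lambda x: f(g(x)), inner)   # fuse: map f . map g
    return ("map", f, stream)

def run(stream):
    """Interpret a stream expression down to a Python list."""
    if isinstance(stream, tuple) and stream[0] == "map":
        _, f, inner = stream
        return [f(x) for x in run(inner)]
    return list(stream)
```

A real stream compiler would apply many such rewrites and generate loop code, but the shape of the optimization is the same.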
- Finally tagless, partially evaluated: Tagless staged interpreters for simpler typed languages
- Bottom-up rewriting with smart constructors, hereditary substitution & normalization by evaluation
Application: Incremental Programming
It is often a good idea to avoid redoing expensive computations. As a first approach, we can cache the result of a function call with a certain value, so that we can just return the cached result when the function is called with the same value again. However, when the input changes, then, no matter how small the change, it seems that we would need to perform the entire computation from scratch. The field of incremental computation studies how we can construct programs that reuse already computed results when the data changes. One application are databases: if you have defined an expensive query on your data, you don't want to redo the entire query once some new data is added.
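The contrast between recomputing from scratch and reacting to a change can be sketched with a trivially simple aggregate; maintaining a sum is illustrative, systems like Differential Dataflow generalize this to arbitrary dataflow programs:

```python
# Sketch: an aggregate maintained incrementally. The total is computed
# from scratch once; afterwards each small change updates the result
# in O(1) instead of rescanning all of the data.
class IncrementalSum:
    def __init__(self, data):
        self.data = list(data)
        self.total = sum(self.data)   # from-scratch computation, once

    def add(self, value):
        # React to the *change* instead of recomputing sum(self.data).
        self.data.append(value)
        self.total += value
```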
- Differential Dataflow
- Incremental λ-Calculus in Cache-Transfer Style
- DBSP: Automatic Incremental View Maintenance for Rich Query Languages
Technique: Programs are Proofs
"In 1934 Curry observed that the types of combinatory logic can be seen as axioms of intuitionistic logic. In 1969 Howard observed that natural deduction from proof theory forms a model for the lambda calculus. More informally, the return type of a function is analogous to a logical theorem, subject to hypotheses corresponding to the types of the argument; and that the program to compute that function is analogous to a proof of that theorem. This sets a form of logic programming on a rigorous foundation: proofs can be represented as programs, and especially as lambda terms, or proofs can be run." https://en.wikipedia.org/wiki/Curry%E2%80%93Howard_correspondence
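The correspondence can be seen directly in a dependently typed language; the following sketch uses Lean 4 syntax, where one and the same term is read both as a program and as a proof:

```lean
-- Read as a type, (A → B) → A → B is the type of function application;
-- read as a formula, it is the theorem "if A implies B and A holds,
-- then B holds" (modus ponens). The term below is both the program
-- and the proof.
def modusPonens {A B : Prop} : (A → B) → A → B :=
  fun f a => f a
```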
However, history didn't stop in 1969. To this day, exciting things can be explained and discovered by exploiting proofs that can be run to give results, such as:
- Intrinsic Proofs
- Well-typed & Well-scoped interpreters
- Parametricity translation
- Logical Relations
- Type-and-Effect System
- Type-and-Coeffect System
Some recent papers we find interesting are:
- A correct-by-construction conversion from lambda calculus to combinatory logic
- On the Versatility of Open Logical Relations
- Differential Logical Relations. Part II: Increments and Derivatives
- Homotopical Patch Theory
Application: Automated Programming
The goal of automated programming is to accelerate software development by (semi-)automating the process of writing code and tests.
Property-Based Testing
In property-based testing, we state properties that a function should satisfy, such as "sorting a list does not change the length of the list" for a sorting function. The function is then tested with randomly generated inputs: if a counterexample is found that invalidates the property, the test fails; otherwise it passes.
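The mechanics can be sketched without any library: generate random inputs and search for a counterexample. Here a deliberately buggy "sort" is used for illustration; real tools such as QuickCheck or Hypothesis also shrink the counterexample to a minimal one:

```python
# Sketch: property-based testing finds a counterexample for a buggy
# sort that silently drops duplicates (violating the length property).
import random

def buggy_sort(xs):
    return sorted(set(xs))   # bug: duplicate elements are lost

def find_counterexample(trials=1000):
    random.seed(0)           # deterministic for this sketch
    for _ in range(trials):
        xs = [random.randint(0, 5) for _ in range(random.randint(0, 8))]
        if len(buggy_sort(xs)) != len(xs):
            return xs        # property violated: the length changed
    return None
```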
In a project, we could investigate and implement some procedure for the automated generation of data, following techniques from a paper. For the seminar, a few recent papers on this topic would be the following -- depending on your interest, we might also find older or more recent papers:
Program Synthesis
The goal of program synthesis is to turn a high-level specification of a problem into a program, ideally an efficient one. This usually involves searching through the (very large) space of possible programs to find the ones that satisfy our requirements. One approach is to turn the search problem into a set of logical formulas. These are then passed to an off-the-shelf SMT (Satisfiability-Modulo-Theories) solver, which (hopefully) produces a solution that satisfies the formulas. Searching through the space of programs can also be done through rewriting, where we replace part of a program with an equivalent expression. A data structure called an e-graph allows storing many similar terms in a compact fashion, making the search, also called equality saturation, more tractable.
Recently, efficient frameworks for equality saturation have been developed in Rust and Julia. These frameworks also support program analysis, where properties of the programs are inferred to aid with rewriting. Equality saturation has shown promise in making floating point computations more numerically stable and in optimizing linear algebra programs.
- Growing solver-aided languages with Rosette
- Better Together: Unifying Datalog and Equality Saturation
- babble: Learning Better Abstractions with E-Graphs and Anti-Unification
- Metatheory.jl: Fast and elegant algebraic computation in Julia with extensible equality saturation
- Optimizing Recursive Queries with Program Synthesis
- Languages with Decidable Learning: A Meta-theorem
Application: Probabilistic Programming
Probabilistic programming is, simply put, programming with probabilities. It can be used in the context of artificial intelligence or machine learning, or, more concretely, to perform random choices and then ask what the probability of certain results is. For example, consider the program:
var x: Bool = flipCoin();
var y: Bool = flipCoin();
and then consider asking to infer the probability of x == y (it should be 50%).
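For a finite program like this, inference can be sketched by exhaustively enumerating all executions together with their probabilities; the `event` interface below is illustrative:

```python
# Sketch: exact inference for the two-coin program by enumerating every
# execution path with its probability and summing the paths where the
# queried event holds.
from itertools import product

def prob(event):
    """Probability of event(x, y) over two independent fair coin flips."""
    total = 0.0
    for x, y in product([True, False], repeat=2):
        weight = 0.5 * 0.5          # each fair flip contributes 1/2
        if event(x, y):
            total += weight
    return total
```

Enumeration only works for finite (or truncated) programs; the papers below tackle recursive and continuous cases.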
In a project, we could investigate and develop some probabilistic language constructs following some paper. For the seminar, a few recent papers on this topic would be the following -- depending on your interest, we might also find older or more recent papers:
- Reasoning about Probabilistic Programs
- Exact Recursive Probabilistic Programming
- Factor Graph Grammars
- Symbolic Disintegration with a Variety of Base Measures
- The Base Measure Problem and its Solution
Project ideas:
- Implementing a simple DSL with a Probabilistic Generating Functions semantics in Lean
- Implementing a simple DSL with a Straight Through Estimator semantics
Application: Choreographic Programming
"Choreographic programming is an emerging paradigm for programming distributed systems. In choreographic programming, the programmer describes the behavior of the entire system as a single, unified program -- a choreography -- which is then compiled to individual programs that run on each node, via a compilation step called endpoint projection."
- HasChor: Functional Choreographic Programming for All
- Modular Compilation for Higher-Order Functional Choreographies
Technique: Programming with Effects
Code often has a happy path to its goal, but needs to consider a lot of error cases as well. These error cases deviate from the path and need to be handled; in software we can leverage several solutions to this problem, differing in their power:
- Error codes. A simple solution is to return an error code in case of an error and handle that.
- Exceptions. Exceptions enable a clean separation between the code that raises an exception and the code that handles it.
- Coroutines. Instead of raising an exception and thereby stopping, a coroutine can suspend execution with a question, and other code can resume the coroutine with an answer.
- Continuations. Further, it might be possible to create a copy of the current program execution state, resume execution and then later backtrack and continue execution with a different value.
At some point, we no longer consider these merely as "side" effects, i.e., things which went wrong, but instead use the suspensions and continuations as equally important parts of the primary goal of some functionality -- an effect. Effect handlers, as generalizations of exception handlers, offer interesting new ways to modularize software. Effects and co-effects can accurately represent additional capabilities of and restrictions on parts of code.
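The suspend-with-a-question, resume-with-an-answer pattern can be sketched with Python generators; this is only a rough analogue of an effect handler, and all names are illustrative:

```python
# Sketch: a computation performs an "effect" by yielding a request and
# suspending; the handler decides on an answer and resumes it. The
# handler, not the computation, chooses the effect's semantics.
def greet():
    name = yield "ask_name"        # perform the effect, suspend here
    return f"Hello, {name}!"

def handle(computation, answers):
    gen = computation()
    request = next(gen)            # run until the first effect request
    try:
        gen.send(answers[request]) # resume with the handler's answer
    except StopIteration as done:
        return done.value          # the computation's final result
```

Full effect handlers additionally allow resuming a computation zero or multiple times, which plain generators cannot express.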
In a project, we could learn how to use such an effect system and implement some case studies, or implement part of an effect compiler yourself following some papers. For the seminar, a few recent papers on this topic would be the following -- depending on your interest, we might also find older or more recent papers:
- High-Level Effect Handlers in C++
- ‘do’ unchained: embracing local imperativity in a purely functional language
- From Capabilities to Regions: Enabling Efficient Compilation of Lexical Effect Handlers
Application: Differentiable Programming
Training a neural network means optimizing the parameters which control its behavior, with respect to a loss function. The usually employed optimization algorithms, which are based on gradient descent, require computing the loss function’s gradient. This means that we need to differentiate the neural network. While this could in principle be done by hand, automatic differentiation (AD) allows computing the derivative of a given program without additional programming effort. As AD is not restricted to the specific operations used by typical neural networks, more complex constructs, for example involving control flow, can be employed in machine learning, as long as the program remains differentiable and has trainable parameters. This approach, which generalizes from deep neural networks to a broader class of programs, has been called differentiable programming.
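Forward-mode AD can be sketched in a few lines with dual numbers: every value carries its derivative, and overloaded arithmetic propagates both. Only addition and multiplication are shown here for brevity:

```python
# Sketch of forward-mode automatic differentiation via dual numbers.
class Dual:
    def __init__(self, value, deriv):
        self.value, self.deriv = value, deriv

    def __add__(self, other):
        # sum rule: (u + v)' = u' + v'
        return Dual(self.value + other.value, self.deriv + other.deriv)

    def __mul__(self, other):
        # product rule: (u * v)' = u'v + uv'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

def derivative(f, x):
    return f(Dual(x, 1.0)).deriv   # seed with dx/dx = 1
```

Because the derivative rides along with ordinary evaluation, programs with control flow differentiate for free, which is exactly the appeal of differentiable programming.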
- Forward- or reverse-mode automatic differentiation: What’s the difference?
- You only linearize once: Tangents transpose to gradients
- Deep Learning with Parametric Lenses
- Reverse-Mode AD in a Functional Framework: Lambda the Ultimate Backpropagator
Technique: Programming with Modalities
Modal logic revolves around two logical operators, whose meaning is "it is necessary that X" and "it is possible that X", respectively. The principles governing these operators turn out to be useful for reasoning about other concepts as well, like knowledge (in epistemic logic) or the progress of time (in linear temporal logic).
In computing, we can interpret logical operators as type constructors and their proof rules as syntactical constructs in a programming language (see also Technique: Programs are Proofs). This view on modalities has a number of applications, from showing the absence of side effects to tracking resource usage and modelling event-driven programming.
- Idris 2: Quantitative Type Theory in Practice
- Recovering Purity with Comonads and Capabilities
- The Essence of Event-Driven Programming
- A judgmental reconstruction of modal logic
Engineering COPL-style language interpreters
Our course, Concepts of Programming Languages (not required for this topic, but certainly helpful), uses interpreters to study how specific foundational language features work.
However, a realistic language implementation requires much more than what we have discussed in our interpreters – not in terms of supported language concepts, but rather in terms of functionality expected of the implementation. These may include:
- Parsing
- Error reporting
- Compilation to machine code
- Runtime optimizations
- Syntax highlighting
- Code formatting
- Refactorings
These days, one does not have to start from scratch when implementing language tooling. Solutions that will help might include:
- parsley parser combinators
- a language server protocol implementation
- GraalVM and the Truffle framework
- Web Assembly as a compilation target
- LLVM as a compilation target
- Tree-sitter language grammars
If you choose this project topic, we will work with you to figure out an interesting slice of the problems/techniques above to work on. Feel free to already propose a focus area if you have something that interests you. None of the above lists are meant to be exhaustive – though we do ask for solutions we could potentially integrate with the actual COPL interpreters.